CONVERGENCE OF FIXED POINT CONTINUATION ALGORITHMS FOR MATRIX 

RANK MINIMIZATION 



DONALD GOLDFARB* AND SHIQIAN MA * 
June 18, 2009. Revised Aug. 05, 2009. Revised June 28, 2010. 

Abstract. The matrix rank minimization problem has apphcations in many fields such as system identification, optimal 
control, low-dimensional embedding etc. As this problem is NP-hard in general, its tightest convex relaxation, the nuclear 
norm minimization problem is often solved instead. Recently, Ma, Goldfarb and Chen proposed a fixed point continuation 
algorithm for solving the nuclear norm minimization problem 1331 . By incorporating an approximate singular value decompo- 
sition technique in this algorithm, the solution to the matrix rank minimization problem is usually obtained. In this paper, 
we study the convergence / recoverability properties of the fixed point continuation algorithm and its variants for matrix rank 
minimization. Heuristics for determining the rank of the matrix when its true rank is not known are also proposed. Some 
of these algorithms are closely related to greedy algorithms in compressed sensing. Numerical results for these algorithms for 
solving affinely constrained matrix rank minimization problems are reported. 
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1. Introduction. In this paper, we are interested in the affinely constrained matrix rank minimization 
(MRM) problem, which can be cast as 



(1.1) 



min rank(X) 
s.t. AiX) = b, 



where X G M™^", 6 G and A : M™^" h-> Rp is a linear map. Without loss of generality, we assume that 
m < n throughout this paper. 

Problem p.l|) has applications in many fields such as system identification [32], optimal control [20l 
[Tnilll], and low-dimensional embedding in Euclidean space [30], etc. For example, consider the problem of 
designing a low-order discrete-time controller for a plant, so that the step response of the combined controller 
and plant lies within specified bounds. Suppose the plant impulse response is h{t)^ t = 0, . . . , A^, the controller 
impulse response is x(t), t =: 0, . . . , N , and u{t) ^ l,t ~ 0, . . . , N is the step input. Then finding a low-order 
system is equivalent to solving the following problem: 

, , min ia.nk{H{x)) 

s.t. < (/i*a;*w)(t) <6„(t),i = 0, ...,iV, 

where bi and 6„ are given lower and upper bounds on the step response, * denotes the convolution operator, 
and Hix) is the Hankel matrix (see e.g., [Illl39]). (|1.2p is just a specific application of (jl.ip . 
A special case of (|l.ip is the matrix completion problem: 

min rank(X) 

1.3 ^ ^ 

s.t. x,j = M,j, y{i,j)en. 

This problem has applications in online recommendation system, collaborate filtering [40[ I41j . etc. The 
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famous Nctflix problem |37| is an example of a matrix completion problem. In it users provide ratings to 
some of the movies in a list of movies. Here Mij is the rating given to j-th movie by i-th user. Since users 
only rate a limited number of movies in the list, we only know some of the entries of the matrix M. The goal 
of the Netflix problem is to fill in the missing entries in this matrix. It is commonly believed that only a few 
factors contribute to people's tastes in movies. Thus the matrix A/ will generally be of low-rank. Finding 
this low-rank completion to AI is just the matrix completion problem (jl.3p . 

If X is a diagonal matrix, then (|l.ip becomes the compressed sensing problem [51 [H] : 

(^4) nrin ||x||o 

s.t. Ax — b, 

where A G M™^", b e K™, and ||x||o, which is called the £o norm, counts the number of nonzero elements in 
the vector x. The compressed sensing problem, which is currently of great interest in signal processing, is 
NP-hard [35] . Recent studies in compressed sensing have shown that under certain randomness hypothesis, 
a limited number of measurements is sufRcient to find the optimal solution to (|1.4p by solving a convex 
relaxation of it. The convex envelope of the function ||.t||o on the set {x E M" : ||a;||oo < 1} is the £i norm 
:= J2i EH- Thus the closest convex relaxation of problem (|1.4p is the convex problem: 



(1.5) 



min 1 1 a; 1 1 1 
s.t. Ax = b. 



Many algorithms for solving (|1.4p and (jl.Sp have been proposed. These include greedy algorithms [HI [T31 
m [H [31 in [II 13 for (dl and convex optimization algorithms [3 [H [HI [lH [35] for See [TO] for 

more information on the theory and algorithms for compressed sensing. 

The matrix rank minimization problem (|l.ip is also NP-hard. To get a tractable problem, we can replace 
rank(X) by its convex envelope, the nuclear norm of X [35], as proposed by Fazel et al. [TO]. The nuclear 
norm of X, denoted by is defined as the sum of the nonzero singular values of X; i.e., if the singular 

values of X are ci > (72 > • • • > > Cr+i = . . . = (Jm = 0, then 

r 

\\X\U = J2a^. 

i=l 

Thus, the nuclear norm relaxation of (|l.ip is: 

. , min IIXIL 

1.6 " " 

s.t. A{X) ^ b. 

Let A be the matrix version of A, i.e., A{X) = A ■ vec{X), vec{X) is the vector obtained by stacking 
the columns of the matrix X in a natural order. Recht et al. [35] proved that if the entries of A obey 
some random distribution and the number of measurements p > Cr(rn + n)log(mn), then with very high 
probability, most m x n matrices of rank r can be recovered by solving problem (|1.6p , where C is a positive 
constant; i.e., an optimal solution to p.6p gives an optimal solution to (jl.ip . 

If b is contaminated by noise, then p.6p should be relaxed to 

min IIXIL 

1.7 " " 

s.t. \\A{X)-b\\2<9, 



where > is the noise level. The Lagrangian version of (|1.7p can be written as 
(1.8) mm^l\\X\U + ^\\A{X)-b\\l 

where is a Lagrangian multiplier. 

Several algorithms have been proposed for solving (jl.ip and (|1.6p . Using the fact that (|1.6p is equivalent 
to the semidefinite programming (SDP) problem 



mm 

X,Wi,W2 



(1-9) s.t. 



i(Tr(VKi)+Tr(VK2)) 

Wi x' 

X^ W2 
A{X) = 6, 



^0, 



where Tr(VF) denotes the sum of the diagonal elements of the square matrix W, Recht, Fazel and Parrilo 
[38] and Liu and Vandenberghe [32] proposed interior point methods to solve this SDP. However, these 
interior point methods cannot be used to solve large problems. First-order methods were proposed by Cai, 
Candes and Shen [4] and Ma, Goldfarb and Chen |33j that can solve very large matrix rank minimization 
problems efficiently. One of the algorithms in [33], which is called FPCA (Fixed Point Continuation with 
Approximation SVD), almost always achieves the best recoverability. FPCA can recover m x n matrices 
of rank r using p samples even when r is very close to r,„ax := max{r|r'(m + n — r)/p < 1}. rmax is the 
largest rank of m x n matrices that one can recover with only p samples. In this paper, we study the 
convergence/recoverability properties and numerical performance of FPCA and some of its variants. Our 
main contribution is a weakening of the conditions previously given by Lee and Bresler [28j required for the 
approximate recovery of a low-rank matrix. 

Notation. We use M" to denote the nonnegative orthant of M". We use A* to denote the adjoint 
operator of A. We define the inner product of two matrices X and F G R"^" to be {X,Y) = Tr{X^Y) = 
Tt{Y^X), and denote by \\X\\f = {Tt{X^ X))^/"^ the Frobenius norm of the matrix X and by ||.t||2 the 
Euclidean norm of the vector x. ||X||2 is the spectral norm of the matrix X, which is defined as the largest 
singular value of X. Henceforth, we will denote A{X) by AX as this should not cause any confusion. For 
example. A* AX := A*{A{X)). 

Outline. The rest of this paper is organized as follows. In Section 2 we review the role that the restricted 
isomctry property plays in the theory of compressed sensing and matrix rank minimization. We also present 
three propositions from [28] that provide the bases for theoretical results that we give later in the paper. We 
review the Fixed Point Continuation (FPC) and FPC with Approximate SVD (FPCA) algorithms proposed 
in [53] in Section 3. We then address the first variant of FPCA, which we call iterative hard thresholding 
(IHT). and prove convergence results for it in Section 4. Section 5 is devoted to another variant of FPCA, 
which is called iterative hard thresholding with matrix shrinkage (IHTMS), and convergence results for it. 
We establish convergence/recoverability properties of FPCAr, a very close variant of FPCA, in Section 6. 
Some practical issues regarding numerical difficulties and ways to overcome them are discussed in Section 
7. Finally, we give some numerical results obtained by applying these algorithms to both randomly created 
and real matrix rank minimization problems in Section 8. 



2. Restricted Isometry Property. In compressed sensing and matrix rank minimization, the re- 
stricted isometry property (RIP) of the matrix A or linear operator A plays a key role in the relationship 
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between the original combinatorial problem and its convex relaxation and their optimal solutions. 
The definition of the RIP for matrix rank minimization is: 

Definition 2.1. The linear operator A : M™'^" — > W is said to satisfy the Restricted Isometry Property 
with the restricted isometry constant driA) if Sr{A) is the minimum constant that satisfies 

(2.1) il-Sr{Amx\\l < \\AX\\l < {l + SriAmx\\%, 

for all X e M"""^" with rank(X) < r. Sr{A) is called the RIP constant. Note that 5^ < St, if s<t. 

The RIP concept and the RIP constant 5r {A) play a central role in the theoretical developments of this 
paper. We first note that if the operator A has a nontrivial kernel, i.e., there exists X G ]R™x" such that 
AX = and X 7^ 0, then dn{A) = 1. Second, since the rank of a matrix r must be an integer number, 
the smallest possible value of the restricted isometry constant is 5i{A). Finally, if we represent A in the 
coordinate form {AX)i = Tr{AiX), i ~ 1, . . . ,p, then 6r{A) is related to the joint kernel of the matrices Ai. 
For example, if there exists a matrix X e r™x" with rank r such that AiX = 0, i = 1, . . . ,p, then 6r{A) = 1. 
Our results in this paper do not apply to such a pathological case. 

For matrix rank minimization (jl.l[) . Recht et al. |38| proved the following results. 

Theorem 2.2 (Theorem 3.3 in [5H])- Suppose that rank(X) < r, r > 1 and dsriA) < 0.1. Then 
and (|1.6p have the same optimal solution. 

Theorem 2.3 (Theorem 4.2 in [SB]). Fix S € (0, 1). If A : M™x" Rp is a nearly isometric random 
map (see Definition 4-1 'in WE^), then for every I < r < m, there exist constants Co,Ci > depending only 
on S such that, with probability at least 1 — exp(— cip), dr{A) < S whenever p > CQr{m + n) log(mn). 

Theorems l2.2l and l2.3l indicate that if ^ is a nearly isometric random map, then with very high probability, 
A will satisfy the RIP with a small RIP constant and thus we can solve p.ip by solving its convex relaxation 
(jl.6|) . As an example, if A is the matrix version of the operator A, and its entries A^j are independent, 
identically distributed (i.i.d.) Gaussian, i.e., Aij ^ A/'(0, 1/p), then A is a nearly isometric random map. For 
other nearly isometric random maps, see |38j . 

In Section [8l we will show empirically that when the entries of A are i.i.d. Gaussian, the algorithms 
proposed in this paper can solve the matrix rank minimization problem (|l.ip very well. 

It is worth noticing that the linear map A in the matrix completion problem (jl.3p does not satisfy the 
RIP. A counter example is given in [S]. For theories and algorithms for matrix completion problems, we refer 

to laiiisiiiiiMiiiisiiiiisi]- 

In our proofs of the convergence of FPCA variants, we need A to satisfy the RIP. Before we describe 
some properties of the RIP that we will use in our proofs, we need the following definitions. 

Definition 2.4 (Orthonormal basis of a subspace). Given a set of rank-one matrices = 
{ipi, . . . ,ipr}, there exists a set of orthonormal matrices T ~ {71,..., 7^}, i.e., — 0, for i ^ j 

and ||7i||_F = 1 for all i, such that span{T) = span{"^). We call T an orthonormal basis for the subspace 
span{'^). We use P^X to denote the projection of X onto the subspace span{T). Note that PrX = P^,X and 
rank(PrA) < r,VX G R™^". 

Definition 2.5 (SVD basis of a matrix). Assume that the rank-r matrix X^ has the singular value 
decomposition X^ = X]I=i <^i'^i'^J ■ T := {uivj ,U2V2 , . ■ ■ ,UrvJ} is called an SVD basis for the matrix Xr. 
Note that elements in T are orthogonal unit-norm rank- one matrices. 
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We now list some important properties of linear operators that satisfy RIP. 

Proposition 2.6. Suppose that the linear operator A : M™^" W satisfies the RIP with constant 
6r(A). Let * be an arbitrary orthonormal subset o/M™^" such that rank(P*X) < r,yX £ M™^". Then, 
for all b E MP and X E R™^", the following properties hold: 

(2.2) \\P^A*b\\F < V'^ + W)\\bh 

(2.3) (1 - SriA))\\P^X\\F < \\P4,A*AP^X\\f < (1 + Sr{A))\\P^X\\F 



Proposition 2.7. Suppose that the linear operator A : R™^" satisfies the RIP with constant 

Sr{A). Let vf, v]/' be arbitrary orthonormal subsets o/R™^" such that rank(P-4,u*'-''^) < for any X G R™^". 
Then the following inequality holds 

(2.4) \\P^A*A{I - P^,)X\\f < 5r{A)\\{I - P4,)X\\F,yX e span(4'')- 
Proposition 2.8. If a linear map A : R™^" Rp satisfies 

(2.5) \\AX\\l < (1 + Sr{A))\\X\\l, VX e R™^",rank(X) < r, 
then 



(2.6) \\AX\\2 < v/1 + SriA) ( \\X\\f + ) , yxe 



The proofs of Propositions 12.61 12.71 and 12.81 are given in the Appendix. 

3. FPC Revisited. To describe FPC and FPCA and its variants, we need the following definitions. 

Definition 3.1. Assume that the singular value decomposition of the matrix X e jj^x" jg given by 
X ~ X]I=i o'i'U'ivJ with (Ti > (72 > . . . > CTm- Then the best rank-r approximation Rr{X) to the matrix X is 
defined as 

r 

Rr{X) = ^ OiUivJ . 
i=l 

Rr : R™^" — > ]ij"ix" ig also called the hard threholding/ shrinkage operator with threshold r. 

Definition 3.2. Assume the SVD of the matrix X is given by X ^ UDiag{a)V^ . For v > 0, the 
matrix shrinkage operator Si,{X) is defined as: 

S^{X) = UDiagiia - iy)+)V^ , 

where is defined as := max(a,0). This matrix shrinkage operator is also called the soft shrinkage 
operator with threshold v. 

FPC, whose development was motivated by the work on li regularized problems in [3T], is based on 
applying an operator splitting technique to the optimality conditions for (|1.8p . Note that X* is the optimal 

^ Propositions 12.61 and 12.81 were first proposed by Lee and Bresler without proof in 1281 . Proofs of Propositions 12.61 and 12.81 
were provided later in [27) . 
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solution to (|1.8|) if and only if 



(3.1) 0£fid\\X*\\,+g{X*), 

where g{X*) = A*{AX* — b) is the gradient of the least squares term — &II2, and (9||X*||* is the 

subgradient of the nuclear norm ||X*||* of X*. According to [3], the subgradient of is given by 

(3.2) = {UV^ + W : U'^W = 0, WV = 0, < 1}, 

where the SVD of X is given hy X = J7Diag(cr)l^^, [7 G R'"^^^!/ e M'^^^cr G W^. 

Now based on the optimality condition p.ip . we can develop a fixed point iterative scheme for solving 
by adopting an operator splitting technique. Note that is equivalent to 



(3.3) e T^id\\X*\\^ +X* - {X* - Tg{X*)) 
for any t > 0. If we let 

Y* =X* - Tg{X*), 

then p.3p can be rewritten as 

(3.4) eTfid\\X*\\,+X* -Y*, 
i.e., X* is the optimal solution to 

(3.5) min Tfi\\X\\, + hx - Y*\\%. 

It is known that Srfi{Y*) gives the optimal solution to p.Sp 1331 . Hence, the following fixed point 
iterative scheme can be given for solving (jl.Sp : 



(3.6) 



X''+^ = 5,^(r^+i). 



The following convergence result is proved in |33j . 

Theorem 3.3 (Theorem 4 in [53). Assume r e (0, 2/Ama2.(yl*yl)), w/iere A„iax(-4*y4,)) denotes the 
largest eigenvalue of A* A. The sequence {X''} generated by the fixed point iterations (|3.6p converges to 
some X* e X* , where X* is the optimal set of problem (jl.Sp . 

Note that in every iteration of p.6p . an SVD has to be computed to perform the matrix shrinkage 
operation, which is very expensive. Consequently, FPCA uses an approximate SVD to replace the whole 
SVD, i.e., it computes only a rank-r approximation to Y^^^ . Note that there are many ways to get a rank-r 
approximation to Y^'^^ . Here we assume that the best rank-r approximation Rr{Y^^^) is used. In Scction[7l 
we discuss a Monte Carlo method to approximately compute R^iY^^^) since computing i?,,(y'^'+^) exactly 
is still expensive if r is not very small and the matrices are large. By adopting a continuation strategy for 
the parameter ^ in (|3.6p . we arrive at the following FPCA algorithm (Algorithm [1]) as proposed in [33] . 

We can see that FPCA adopts three techniques, hard thresholding, soft shrinkage and continuation. 
These three techniques have different properties which, when combined, produce a very robust and efficient 

6 



Algorithm 1: Fixed Point Continuation with Approximate SVD for MRM (FPCA) 



Initialization: Set X := A°. 

for /I = ^1, /i2, /iL = p. do 
while not converged do 
Y := X -TA*iAX -b). 
choose r. 

X Sr^,{RriY)). 



algorithm with great recoverabihty properties. By using only one or two of these three techniques, we get 
different variants of FPCA. We will study two of these variants, Iterative Hard Thresholding (IHT) and 
Iterative Hard Thresholding with soft Matrix Shrinkage (IHTMS) in Sections 2] and [SJ respectively, and 
FPCA with given rank r (FPCAr) in Section H 

In the following sections, we assume that the rank r of the optimal solution is given and we compute 
the best rank-r approximation to Y in each iteration. In Section [71 we give a heuristic for choosing r in 
each iteration if r is unknown and use the fast Monte Carlo algorithm proposed in |15j to compute a rank-r 
approximation to Y. 

4. Iterative Hard Thresholding. In this section, we study a variant of FPCA that we call Iterative 
Hard Thresholding (IHT) because of its similarity to the algorithm in [2] for compressed sensing. 

If in FPCA, we assume that the rank r is given, we do not do any continuation or soft shrinkage, and 
always choose the stepsize r equal to one, then FPCA becomes Algorithm [2] (IHT). At each iteration of 
IHT, we first perform a gradient step Y^^^ := X'' — A*{AX'' — b), and then apply hard thresholding to the 
singular values of i.e., we only keep the largest r singular values of Y'''^^, to get X''^^. 



Algorithm 2: Iterative Hard Thresholding (IHT) 

Initialization: Given A°,r. 
for k ~ 0,1,. . . do 

yk+l - A*{AX^ - b). 

_ A'^+i :=ii,(y'^-+i) 



As previously mentioned, IHT is closely related to an algorithm proposed by Blumensath and Davies [2] 
for compressed sensing. Their algorithm for solving (|1.4p performs the following iterative scheme: 



(4.1) 



yk+l ^ j.k _ Tg{x^) 
X^+l =iJ,(yfe+l), 



where g{x^) = [Ax'^ — b), Hr{y) is the hard thresholding operator that sets all but the largest (in 
magnitude) r elements of y to zero. Clearly, IHT for matrix rank minimization and compressed sensing are 
the same except that the shrinkage operator in the matrix case is applied to the singular values while in the 
compressed sensing case it is applied to the solution vector. 

To prove the convergence/recoverability properties of IHT for matrix rank minimization, we need the 
following lemma. 

Lemma 4.1. Suppose X := Rr{Y) is the best rank-r approximation to the matrix Y , and T is an SVD 
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basis of X . Then for any rank-r matrix and SVD basis Tr of Xr, we have 

(4.2) \\PbX - PbY\\f < WPsXr - PbY\\f, 

where B is any orthonormal set of matrices satisfying span{T U Tr) C span{B). 

Proof. Since X is the best rank-r approximation to Y and rank{Xr) = r, \\X — Y\\p < \\Xj. — Y\\p. 
Hence, 

WPsiX - Y)\\l + 11(7 - PB)iX Y)\\% < WPsiXr - Y)\\l + 11(7 - Pb^X^ - Y)\\l. 

Since (J - Pb)X = and (J - Pb)^^ = 0, this reduces to (g^)- □ 

For IHT, we have the following convergence results, whose proofs essentially follow those given by 
Bluniensath and Davies [2] for IHT for compressed sensing. Our first result considers the case where the 
desired solution Xr satisfies a perturbed linear system of equations AXr + e = b. 

Theorem 4.2. Suppose that b = AXr + e, where Xr is a rank-r matrix, and A has the RIP with 
SsriA) < 1/V32. Then, at iteration k, IHT will recover an approximation X^ satisfying 

(4.3) \\Xr - < 2-^ \\Xr - X^Wp + 4.34|le|l2. 

Furthermore, after at most k* [log2 — '^^\\f /II^IIs)] iterations, IHT estimates X^ with accuracy 



(4.4) 



Xr — X 



F 

k' 



< 5.34||e||2. 

F 



Proof. Let Tr and F*^ denote SVD bases oi Xr and X'', respectively, and denote an orthonormal basis 
of the subspace span(Fr U F*^'). Let Z'' Xr — X'^ denote the residual at iteration k. Since PB^+i-^r = Xr 
and Pb^^^X^^"^ = X''^^, it follows first from the triangle inequality and then from Lemma HTT] that 

11^'- ~ ^''^^lli? - \\PBk + iXr - PSk + iY'^^^Wp + \\PBk+iX''^^ - PSk+iY'^^'^Wp 

(4-5) 

<'^\\PBk+iXr - Pb^ + iY 

Using the fact that b = AXr + e, F'^+i = X'' - A*{AX'' - AXr - e) = + A*iAZ'' + e). Hence, from 
KB, 



\\Xr - X'^+^Wp < 2 \\PB^^:^Xr - PBk + iY''^^\\p 

< 2 \\PB,^,Xr - Pb.+.X" - PB,^,A*AiPB,^,Z'' + {I - Pb,^,)Z'') - PB,^,A*e\\p 

< 2 \\Pb,+,Z'^ - Pb,^,A*A{Pb,^,Z^ + {I- Pb,^,)Z%p + 2 ||PB.^,-4*e||^ 

< 2 ||(/ - Pb,^,A*APb,^,)Pb,^,Z^\\p + 2 \\Pb,^,A*A{I - Ps.+J^i^ + 2 \\PB,^,A*e\\p . 

Since rank(PB,^,X) < 2r,VX e R™^", by applying in Proposition [lH we get. 



Since P<iiP^ = Pip, it follows from (|2.3p in Proposition 12.61 that the eigenvalues of the linear operator 
P^,A*AP^ are in the interval [1 — 5r{A), 1 + 5r{A)]. Letting = Bk+i, it follows that the eigenvalues of 
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PBk+i^*^PBk+i in the interval [1 — (52r(-4), 1 + (52r(-4)]- Hence the eigenvalues of / — Psk+i-^* -^Psk+i 
arc bounded above by S2r{A) and it follows that 

1 1 (/ - Ps.+, A*APb,^, , ^ < S2r (A) 1 1 Ps.+, ^ . 

Also, since Pb^Z'' = Z'', Z'' G span(Bfc) and rank(FB,ui3,+i^) < 3r,VX € R™^", by applying Proposition 
\2l\ we get 

\\Pb,^,A*A{I - Pb,+,)Z''\\^ < 63r{A) II (/ - Pb.^JZ'^Wp ■ 



Thus, since 52r{A) < S^riA), 



\X,. - X''+'\\p < 2S2r{A) WPb.^.Z'^Wp + 2S3r{A) - Ps. + J^i^ + 2v/l+d~2r(^)||e|l2 

< 2V263r{A) ll^ip + 2v/TTM^!|e||2- 



By assumption, S^riA) < l/v32; hence we have 



1 



2- 



(4.6) ||z'=+i||^<-||Z^||^ + 2.17|le 
Iterating this inequality, we get (|4.3p . 

To ensure that the recovery accuracy H-^'^Hp, < 5.34||e||2, it is sufficient to let \\Xr — -''^"H^ < l|6||2, 
which implies that when k > k* := [logg {\\Xr - ^°||^ /|!e||2)] , (031) holds. □ 

For an arbitrary matrix X, we have the following result. 

Theorem 4.3. Suppose that b ~ AX + e, where X is an arbitrary matrix, and A has the RIP with 
SsriA) < l/\/32. Let Xr be the best rank-r approximation to X. Then, at iteration k, IHT will recover an 
approximation X^ satisfying 

(4.7) ||X - X^Wp < 2-'' \\Xr - X^Wp + 5.7Ur, 
where 

(4.8) er = \\X- XrWp + ^\\X- XrW, + l|ej|2. 



is called the unrecoverable energy (see \S6f). Furthermore, after at most k* := [logj (||Xr — -'^"H^ /cr)] 
iterations, IHT estimates X with accuracy 



(4.9) 



X~X^' 



< 6.7Ur 

F 



Proof. From Theorem 14.21 with e = A{X — Xr) + e instead of e, we have 

\\Xr - X'^Wp = \\z''\\p < 2-'= \\Xr. - X'^Wp + 4.34p|l2. 
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By Proposition we know that 



||el|2 < \\A{X - Xr)\\p + \\e\\2 < ^l+6r{A} (^\\X 



XrWf + ^WX-Xrt ]+\\eh- 



ir 



Thus we have from the triangle inequality and (|4.8 



\X - X^^'ll^ < \Xr - X^ll^ + ||X - XrWp 

< 2-^ \\Xr - + 4.34||g||2 + \\X - Xr\ 



< 2-'' \\Xr - X^Wp + (4.34 v/1 + Sr{A) + l)e,. 

< 2-'' \\Xr - X"\\p + 5.7Ur- 



This proves (|4~7| . 

Furthermore, if we want \\X — X'^\\^ < 6.716^, it is sufficient to require 2^'^ \\Xr — -^^"11^ < There- 
fore, after at most k* := \\0g2 {\\^r — -'^"H^ /^r)~\ iterations, (|4.9p holds. □ 

Similar bounds on the RIP constant for an approximate recovery were obtained by Lee and Bresler 
[281 127j for affinely constrained matrix rank minimization and by Lee and Bresler for ellipsoidally constrained 
matrix rank minimization |29| . The results in Theorems 14.21 and 14.31 improve the previous results for afhnely 
constrained matrix rank minimization in [28l[27]. Specifically, Theorems 14. 21 and 14 . 31 require the RIP constant 
Sir < 1/V32 « 0.1768, while the result in [281 [27] requires < 0.04. The IHT algorithm for matrix rank 
minimization has also been independently studied by Meka, Jain and Dhillon in [34j . who have obtained 
very different results than those in Theorems 14.21 and 14.31 



5. Iterative Hard Thresholding with Matrix Shrinkage. We study another variant of FPCA in 
this section. If in each iteration of IHT, we perform matrix shrinkage to Rr(Y) with fixed thresholding 
/U, > 0, we get the following algorithm (Algorithm [3]) , which we call Iterative Hard Thresholding with Matrix 
Shrinkage (IHTMS). Note that Sf,{Rr(Y)) = K(5'^(y)), Vr, ^ and Y. 

Algorithm 3: Iterative Hard Thresholding with Matrix Shrinkage (IHTMS) 

Initialization: Given and r. 

for k ~ 0,1,. . . do 

yfc+i _ _4*(_4xfe - b). 

X^+^ := Rr{S^{Y^+^)). 



For IHTMS, we have the following convergence results. 

Theorem 5.1. Suppose that b = AX,. + e, where Xj. is a rank-r matrix, and A has the RIP with 
^3r{A) < l/v^32. Then, at iteration k, IHTMS will recover an approximation X^ satisfying 



(5.1) 



\Xr - X^Wp < 2-^ \\Xr - X°\\p 



-4.34||e||2 + 4^V^. 



Furthermore, after at most k* [log2 (|| A^r — -^^Wp I (ll^lb + iterations, IHTMS estimates Xj. with 

accuracy 



(5.2) 



< 5.34|le|l2 + 5/iv^. 
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Proof. Using the same notation as in the proof of Theorem 14.21 we know that Psk+i^r = Xr and 
Pbj.^jX'"'+^ = X^^"^ . Using the triangle inequahty we get, 



(5.3) 



+ \\Pb,^,S,{Y^+')-Pb,^,Y''+'\\p. 



Since X^'^^ is the best rank-r approximation to S^iY^'^^)^ by applying Lemma HTTl we get 



Pb,^,X''+' ~ Pb,^.S^{Y'^^')\\p < \\PB,+,Xr - Pb,+,S,{Y 



k+l^ 



(5.4) 



rk+l I 



<\\PBk+iXr - Pb^ + iY 

+ \\PB,^,S,iY'^+')-PB,^,Y'^+'\\p. 



Therefore, by combining (|5.3p . (|5.4p and noticing that 



I Pb,+, (r ) - y ^ < 1 1 5^ (r '=+1 ) -y''+'\\^ < f,v^, 



we have 



Using an identical argument following (|4.5p in the proof of Theorem 14. 2[ we get 

\\Xr - X''+^\\p < 2V263r{A) \\Z''\\p + 2Vl+^3r(^)||e||2 + 2/^7^. 
Now since (-4) < 1 / \/32 , we have 

ll^^+'IU < ^||^iF + 2.17|le||2 + 2/iV^, 

which implies that ((5?T|) holds. Hence (fO)) holds if A:* := [logj - /(IklU + Mv^))] • ^ 

For arbitrary X , we have the following results. 

Theorem 5.2. Suppose that b = AX + e, where X is an arbitrary matrix, and A has the RIP with 
SsriA) < l/-\/32. Let Xr be the best rank-r approximation to X. Then, at iteration k, IHTMS will recover 
an approximation X^ satisfying 



(5.5) 



1^ - |L < 2^'= \\Xr - X°|L + 5.71?r + 4//V^, 



where ir is defined by (|4.8p . Furthermore, after at most k* := \\0g2 {}\Xr — X^^^ /(?,. + /iy^))] iterations, 
IHTMS estimates X with accuracy 



(5.6) 



X^X 



fe* 



< 6. Tier + 5^\/rn. 



Proof. The proof of (|5.5p is identical to the proof of ()4.7p in Thcorem l4.3l except that (j5.ip is used instead 
of (gSl). It also immediately follows from (l53|) that holds for k* ~ [log2 {\\Xr - X°\\F/{er + l-i-^M))] .□ 
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6. FPCA with Given Rank r. In this section, we study the FPCA when rank r is known and the 
stepsize is always chosen as r = 1. This is equivalent to applying a continuation strategy to ^ in IHTMS. 
We call this algorithm FPCAr (see Algorithm 2] below). The parameter -q^ determines the rate of reduction 
of the consecutive jij in continuation, i.e., 

(6.1) ^j+i = max{^j?7^,/2}, j 1, . . . ,i - 1 



For FPCAr, we have the following convergence results. 



Algorithm 4: FPCA with given rank r (FPCAr) 



..> IJL = H- 



Input 

for j = 1,. . . ,L do 

Set fi = fij . 

for k ~ 0,1,. .. , until convergence do 



A* (aX^ 



Set AO^,) -^w'- 



Output: X* 



Theorem 6.1. Suppose that b ~ AXr + e, where A,, is a rank-r matrix, and A has the RIP with 
S^riA) < 1/V32. Also, suppose in FPCAr, after Kj iterations with fixed ^ = /ij, we obtain a solution A^^^^' 
that is then set to the initial point X^j^^^ for the next continuation subproblem ^ = /ij+i . Then FPCAr will 
recover an approximation A^'^^^' that satisfies 



Xr — X 



{Kl) 



<f2-^WA ||A,,-A0||^+ 2-^^'^.^' +! 4.341 



Proof. For X'^^s^''\ which is obtained by setting /i = in the first Ki iterations, we get from Theorem 



EH that iiSsriA) <l/V32 



(6.3) 



{Kl) 
(1) 



< 2-^1 llA,- A°|L + 4.34||e|l2 + 4^iV^. 



Then from iteration Ki + 1 to Ki + A2, we fix ^ = /i2. Again by Theorem 15. 1[ we get 



(6.4) 



Xr — X 



{K2 
(2) 



< 2' 



(Kl) 
(1) 



-4.34||e||2 + 4/i2V^ 



By substituting (|6.3p into (|6.4p . we get 



(6.5) 



Xr — X 



iK2) 
(2) 



<2 



'{K1+K2) 



A,-A"||^ + (2-^'^ + l) 4.34|ie||2 



(2-^MA^l^M + 4A^2^M) 
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Repeating this procedure we can get the foUowing result for X^^-^\ 



< - + |]2-^''^. ^' + 1 4.341 



6 2 



(6.6) 



0=2 



ii=2 



So as long as \xi^ is small and i^/, is large, the recovery error will be very small. □ 
For arbitrary X, we have the following convergence results. 

Theorem 6.2. Suppose that b ~ AX + e, where X is an arbitrary matrix. Let Xr be the best rank-r 
approximation to X. With the same notation and under the same conditions as in Theorem \ 6.1l FPCAr 
will recover an approximation that satisfies 



Y y'^^^'i 



<(2-^'^=i'^^^ {^2^^^=^'^' +1 j A.lUr 



where e,, is defined by 

Proof. We skip the proof here since it is similar to the proof of Theorem 14.31 □ 



7. Practical Issues. In practice, the rank r of the optimal solution is usually unknown in practice. 
Thus, in every iteration, we need to determine r appropriately. We propose some heuristics for doing this 
here. We start with r :~ r„iax- So X^ is a rank-rmax matrix. For the fc-th iteration ( fc > 2 ), r is chosen 
as the number of singular values of X''~^ which are greater than escrf"^, where is the largest singular 
value of X*"'"^ and G (0, 1) is a given tolerance. Sometimes the given tolerance truncates too many of the 
singular values, so we need to increase r occasionally. One way to do this is to increase r by 1 whenever the 
non-expansive property of the shrinkage operator is violated some fixed number of times, say 10. In the 
numerical experiments described in Section [51 we used another strategy; i.e., we increased r by 1 whenever 
the Frobenius norm of gradient g increased by more than 10 times. We tested this heuristic for determining 
r extensively. It enables our algorithms to achieve very good rccovcrability and appears to be very robust. 
For many examples, our algorithms can recover matrices whose rank is almost Tmax with a limited number 
of measurements. 

Another issue in practice is concerned with the SVD computation. Note that in IHT, IHTMS and 
FPCA, we need to compute the best rank-r approximation to y'^+i at every iteration. This can be very 
expensive even if we use a state-of-the-art code like PRO PACK [26] , especially when the rank of the matrix 
is relatively large. Therefore, we used instead the Monte Carlo algorithm LinearTimcSVD proposed in [TS] 
to approximate the best rank-r approximation. For a given matrix A G R™^", and parameters Cs,ks G Ij^ 
with 1 < kg < Cs < n and {pijf^i, Pi > 0, ^"^j^pi = 1, this algorithm returns an approximation to 
the largest fcs singular values and corresponding left singular vectors of the matrix A in 0{m -f n) time. 
The LinearTimcSVD Algorithm, which we found to be much faster than PROPACK, is outlined below in 
Algorithm [H The outputs at{C),t = l,...,ks are approximations to the largest kg singular values and 
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Hj^\ t = 1, . . . ,k are approximations to the corresponding left singular vectors of the matrix A. Thus, the 
SVD of A is approximated by 

A « Ak^ := HkBiagiaiC)){A'^ HkBmg{l/aiC)f . 

Drincas et al. [15] prove that with high probability, the following estimate holds for both ^ = 2 and £^ = F: 

(7.1) \\A-Ai,J\l< min \\A - D\\l + poly{h,l/c,)\\A\\%, 

D:i-a,nk{D)<ks 

where poly{ks,l/cs) is a polynomial in ks and 1/cs. Thus, Ak^ is an approximation to the best rank-fc^ 
approximation to A. 

Algorithm 5: Linear Time Approximate SVD Algorithm [15] 

Input : A e M™x", c,, fc, e Z+ s.t.l < ks < Cs < n, {p^}2^, s.t.p, > 0,J2tiP^ = 1- 
Output: Hk e K^x'^- and at{C),t = l,...,fcs. 
for t = 1,. . . .Cs do 

Pick G 1, . . . , n with Pr[it = a] = Pa,a = 1, . . . , n. 
_ Set C(*) = A(**)/^c^. 

Compute C^C and its SVD; say C^C = J^tU <^f{C)y'y'^. 

Compute /i* = CyyatiC) for t = 1, . . . , ks. 

Return Hk^ , where i?^*' = /i*, and at{C), t = 1, . . . , fc^- 



Note that in this algorithm, we compute an exact SVD of a smaller matrix C^C £ R^^^'^^. Thus, Cs 
determines the speed of this algorithm. If we choose a large c^, we need more time to compute the SVD 
of C^C. However, the larger Cs is, the more likely arc the at{C),t = 1, . . . , fc^ to be close to the largest 
ks singular values of the matrix A since the second term in the right hand side of (|7.ip is smaller. In our 
numerical experiments, we found that we could choose a relatively small Cs so that the computational time 
was reduced without significantly degrading the accuracy. There are many ways to choose the probabilities 
Pi. In our numerical experiments in Section [S] we used the simplest one, i.e., we set all pi equal to 1/n. For 
other choices of pi, see |15| and the references therein. 

Although PRO PACK is more accurate than this Monte Carlo method (Algorithmic]), we observed from 
our numerical experiments that our algorithms are very robust and do not rely on the accuracy of the 
approximate SVDs. 

In the j-th inner iteration in FPCA we solve problem (|1.8p for a fixed /i = /i^; and stop when 

(7-2) r, iiyfcii . < xtol, 

max{l, IIA'^IIf} 

where xtol is a small positive number. We then decrease fj. and go to the next inner iteration. 
Algorithms IHT and IHTMS are terminated when (|7.2p holds. 

8. Numerical Experiments. In this section, wc present numerical results for the algorithms discussed 
above and provide comparisons with the SDP solver SDPT3 [43]. We use IHTr, IHTMSr, FPCAr to denote 
algorithms in which the rank r is specified, and IHT, IHTMS, FPCA to denote those in which r is determined 
by the heuristics described in Section [T] We tested all of the algorithms IHT, IHTMS, FPCA and IHTr, 
IHTMSr, FPCAr on both randomly created and realistic matrix rank minimization problems (jl.ip . IHTr, 

14 



IHT, IHTMSr and IHTMS were terminated when ([T^ holds. FPCAr and FPCA were termmated when 
both (|7.2|) holds and /ife — p.. All numerical experiments were run in MATLAB 7.3.0 on a Dell Precision 
670 workstation with an Intel xeon(TM) 3.4GHZ CPU and 6GB of RAM. All CPU times reported in this 
section are in seconds. 

8.1. Randomly Created Test Problems. We tested some randomly created problems to illustrate 
the recoverability /convergence properties of our algorithms. The random test problems (jl.ip were created in 
the following manner. We first generated random matrices Ml € ]R™><'' and A/^ <S R"^'' with i.i.d. Gaussian 
entries ~ A/'(0, 1) and then set M = AIlAI]^ . We then created a matrix A € ]Rpx™" ^jth i.i.d. Gaussian 
entries Aij ~ Af{Q, 1/p). Finally, the observation b was set equal to b := Avec{M). We use SR = p/{mn), 
i.e., the number of measurements divided by the number of entries of the matrix, to denote the sampling 
ratio. We also list FR = r{m + n — r)/p, i.e. the dimension of the set of rank r matrices divided by the 
number of measurements, in the tables. Note that if FR > 1, then there is always an infinite number of 
matrices with rank r satisfying the p linear constraints, so we cannot hope to recover the matrix in this 
situation. We also report the relative error 



reZ. 



opt 



A/ll 



to indicate the closeness oi Xopt to M, where Xopt is the optimal solution to (jl.ip produced by our algorithms. 
We declared M to be recovered if the relative error was less than 10^"^. We solved 10 randomly created matrix 
rank minimization problems for each set of (m, n,p, r). We used NS to denote the number of matrices that 
were recovered successfully. The average time and average relative error of the successfully solved problems 
are also reported. 

The parameters used in the algorithms are summarized in Table [83] 

Table 8.1 
Parameters used in the algorithms 



parameter 


value 


description 


M 


io-« 


parameter in Algorithms [T] and |4] 




0.25 


parameter in (16.11) 




0.01 


parameter in LinearTimeSVD 


Cs 


2^max 2 


parameter in LinearTimeSVD 


Pi 


1/n, Vi 


parameter in LinearTimeSVD 


xtol 


io-« 


parameter in (17.21) 



We first compare the solvers discussed above that specify the rank r with the SDP solver SDPT3 j43] . 
The results for a set of small problems with m = n ~ 60, with 20 percent sampling (i.e., SR = 0.2 and p = 
720) and different ranks are presented in Table [521 Note that for this set of parameters (m, n,p), the largest 
rank that satisfies FR < 1 is r„iax = 6. 

From Table we can see that the performance of our methods is very robust. They are much faster 
and their abilities to recover the matrices are much better than SDPT3. For ranks less than or equal to 5, 
which is almost the largest rank guaranteeing FR < 1, IHTr, IHTMSr and FPCAr can recover all randomly 
generated matrices with a relative error of the order of le — 5. However, SDPT3 can only recover all matrices 
with a rank equal to 1 or 2. When the rank r increases to 3, SDPT3 can only recover 3 of the 10 matrices. 
When the rank r increases to 4 or 5, none of the 10 matrices can be recovered by SDPT3. 

To verify the theoretical results in Sections 4, 5 and 6, we plotted the approximation errors between 
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Table 8.2 

Comparison between IHTr, IHTMSr and FPCAr with SDPT3 



Prob 


SDPT3 


IHTr 


IHTMSr 


FPCAr 


r 


FR 


NS 


time 


rel.err. 


NS 


time 


rel.err. 


NS 


time 


rel.err. 


NS 


time 


rel.err. 


1 


0.17 


10 


122.93 


2.31e-10 


10 


2.60 


1.67e-05 


10 


2.59 


1.67e-05 


10 


4.63 


9.00e-06 


2 


0.33 


10 


124.26 


3.46e-09 


10 


4.97 


1.99e-05 


10 


4.98 


2.11e-05 


10 


6.06 


1.51e-05 


3 


0.49 


3 


149.74 


2.84e-07 


10 


10.04 


2.38e-05 


10 


9.95 


2.27e-05 


10 


10.64 


2.35C-05 


4 


0.64 









10 


22.99 


2.88e-05 


10 


22.72 


3.05e-05 


10 


23.29 


2.93e-05 


5 


0.80 









10 


75.86 


3.89e-05 


10 


84.13 


3.95e-05 


10 


79.46 


3.94e-05 




100 200 300 400 500 

iteration 



Fig. 8.1. Approximation error versus the iteration number for a problem where the rank equaled 2 

and X* versus the iteration number for algorithms IHTr, IHTMSr and FPCAr in Figure [5TT] for the results 
on a particular matrix of rank 2. From this figure, we can see that logHX*^ — Ar*||i? is approximately a 
linear function of the iteration number k. This implies that our theoretical results in Sections 4, 5 and 6 
approximately hold in practice. 

For the same set of test problems, Tables 18.31 18.41 and 18.51 present comparisons of IHTr versus IHT, 
IHTMSr versus IHTMS and FPCAr versus FPCA. 



Table 8.3 
Comparison between IHTr and IHT 



Prob 


IHTr 


IHT 


r 


FR 


NS 


time 


rel.err. 


NS 


time 


rel.err. 


1 


0.17 


10 


2.60 


1.67e-05 


10 


4.24 


1.74e-05 


2 


0.33 


10 


4.97 


1.99e-05 


10 


7.00 


1.92e-05 


3 


0.49 


10 


10.04 


2.38e-05 


10 


13.27 


2.32e-05 


4 


0.64 


10 


22.99 


2.88e-05 


10 


28.06 


2.93e-05 


5 


0.80 


10 


75.86 


3.89e-05 


10 


96.32 


4.00e-05 



From these tables we see that by using our heuristics for determining the rank r at every iteration, 
algorithms IHT, IHTMS and FPCA perform similarly to algorithms IHTr, IHTMSr and FPCAr which make 
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Table 8.4 

Comparison between IHTMSr and IHTMS 



Prob 


IHTMSr 


IHTMS 


r 


FR 


NS 


time 


rcl.crr. 


NS 


time 


rcl.err. 


1 


0.17 


10 


2.59 


1.67e-05 


10 


3.98 


1.77e-05 


2 


0.33 


10 


4.98 


2.11e-05 


10 


6.95 


2.04e-05 


3 


0.49 


10 


9.95 


2.27e-05 


10 


12.65 


2.30e-05 


4 


0.64 


10 


22.72 


3.05e-05 


10 


27.12 


2.86e-05 


5 


0.80 


10 


84.13 


3.95e-05 


10 


94.13 


4.10e-05 



Table 8.5 
Comparison between FPCAr and FPCA 



Prob 


FPCAr 


FPCA 


r 


FR 


NS 


time 


rel.err. 


NS 


time 


rel.err. 


1 


0.17 


10 


4.63 


9.00e-06 


10 


4.66 


8.88e-06 


2 


0.33 


10 


6.06 


1.51e-05 


10 


6.15 


1.55e-05 


3 


0.49 


10 


10.64 


2.35e-05 


10 


11.50 


2.24e-05 


4 


0.64 


10 


23.29 


2.93e-05 


10 


25.66 


2.88e-05 


5 


0.80 


10 


79.46 


3.94e-05 


10 


83.91 


3.87e-05 



use of knowledge of the true rank r. Specifically, algorithms IHT, IHTMS and FPCA are capable of recovering 
low-rank matrices very well even when we do not know their rank. 

Choosing r is crucial in algorithms IHTr, IHTMSr and FPCAr as it is in greedy algorithms for matrix 
rank minimization and compressed sensing. In Table 18.61 we present results on how the choice of r affects 
the performance of algorithms IHTr, IHTMSr and FPCAr when the true rank of the matrix is not known. 
In Table 18. 6[ the true rank is 3 and the results for choices of the rank from 1 to 6 are presented. The rows 
labeled IHT, IHTMS and FPCA present the results for these algorithms which use the heuristics in Section 
[7] to determine the rank r. From Table 18.61 we see that if we specify a rank that is smaller than the true 
rank, then all of the algorithms IHTr, IHTMSr and FPCAr are unable to successfully recover the matrices 
(i.e., the relative error is greater than le-3). Specifically, since for the problems tested the true rank of the 
matrix was 3, the algorithms failed when r was chosen to be either 1 or 2. If the chosen rank is slightly 
greater than the true rank (i.e., the rank was chosen to be 4 or 5), all the three algorithms IHTr, IHTMSr 
and FPCAr still worked. However, the relative errors and times were much worse than those produced by 
the heuristics based solvers IHT, IHTMS and FPCA. When the chosen rank was too large (i.e., was chosen 
to be 6), IHTr, IHTMSr and FPCAr were only able to recover the matrices in 4, 1 and 3 out of 10 problems, 
respectively. However, IHT, IHTMS and FPCA always recovered the matrices. 

8.2. A Video Compression Problem. We tested the performance of our algorithms on a video 
problem. By stacking each frame of the video as a column of a large matrix, we get a matrix M whose j-th 
column corresponds to the j-th frame of the video. Due to the correlation between consecutive frames of 
the video matrix M is expected to be of low-rank. Hence we should be able to recover the video by only 
taking a limited number of measurements. The video used in our experiment was downloaded from the 



website http://media.xiph.org/video/derf The original colored video consisted of 300 frames where each 
frame was an image stored in the RGB format, which was a 144 x 176 x 3 cube. Since it was too large for 
our use, we preprocessed the video in the following way. We first converted each frame from RGB format 
into grayscale image, so each frame was a 144 x 176 matrix. We then used only that portion of each frame 
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Table 8.6 

Comparison when the given rank is different from the true rank of 3 



Given rank 


NS 


time 


rel.err. 


IHTr 


1 









2 









3 


10 


10.04 


2.38e-05 


4 


10 


21.42 


3.42e-05 


5 


10 


63.53 


5.51e-05 


6 


4 


109.00 


4.44e-04 


IHT 


10 


13.27 


2.32e-05 


IHTMSr 


1 









2 









3 


10 


9.95 


2.27e-05 


4 


10 


22.53 


3.40e-05 


5 


10 


67.89 


5.93e-05 


6 


1 


116.62 


6.04e-04 


IHTMS 


10 


12.65 


2.30e-05 


FPCAr 


1 









2 









3 


10 


10.64 


2.35e-05 


4 


10 


21.26 


3.46e-05 


5 


10 


63.67 


5.99e-05 


6 


3 


108.02 


4.04e-04 


FPCA 


10 


11.50 


2.24e-05 



corresponding to a 39 x 47 submatrix of pixels in the center of each frame. We took the first 20 frames 
in the video. Consequently, the matrix M has m ~ 1833 rows and n = 20 cohimns. We then created a 
Gaussian sampHng matrix A € Rp^(™") as in Section \8A\ with p ~ 1833 * 20 * 0.4 = 14664 rows (i.e., we used 
sampHng ratio SR = 0.4) and computed b = Avec{M) e MP. This 14664 x 36660 matrix A was close to the 
size limit of what could be created by calling the MATLAB function A = randn{p, mn) on our computer. 
Although the matrix M is expected to be of low-rank, it is only approximately of low-rank. Therefore, 
besides comparing the recovered matrices with the original matrix A/, we also compared them with the best 
rank-5 approximation of M . Since the relative error of the best rank-5 approximation of M is 2.33e — 2, we 
cannot expect to get a more accurate solution. Therefore, we set xtol equal to 0.002 for this problem. The 
results of our numerical tests are reported in Table 18.71 The relative errors and CPU times are averaged 
over 5 runs. We do not report any results for SDPT3, because the problem is far too large to be solved by 
an SDP solver. Note that the relative error of the best rank-5 approximation of M is 2.33e — 2, thus from 
Table 18.71 we see that our algorithms were able to recover the matrix M very well and achieve an order of 
accuracy as the best rank-5 approximation. 

We show three frames of video in Figure 18.21 The three images in the first column correspond to the 
frames in the original video. The images in the second column correspond to the frames in the rank-5 
approximation matrix of the video. The images in the third column correspond to the frames in the matrix 
recovered by FPGA. The other five solvers recovered images that were very similar visually to FPCA so we 
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Table 8.7 
Results on recovery of compressed video 




Solvers 


rank 


rel.err. 


time 


IHTr 


5 


6.87e-2 


645 


IHT 


5 


9.76e-2 


949 


IHTMSr 


5 


6.72e-2 


688 


IHTMS 


5 


9.69C-2 


804 


FPCAr 


5 


5.10e-2 


514 


FPCA 


5 


5.17e-2 


1296 




(b) 





f3 El 

Fig. 8.2. Comparison of frames 4, 12 and 18 of (a) the original video, (b) the best rank-5 approximation and (c) the 
matrix recovered by FPCA 

do not show them here. From Figure |8^ we see that FPCA recovers the video very well by taking only 40% 
of the measurements. 
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Appendix. Here we give proofs of Propositions 12. 6[ 12.71 and 12.81 
Proof of Proposition 12.61 

Proof. We prove ([221) first. Since for any X £ M™><", rank(PvtX) < r, we have 



\{X,P^A*b)\ = \{AP^X,b)\ 

< \\AP^X\\2\\b\\2 

< ^l + 5,{A)\\P^X\\F\\b\\, 



<Vl + 3r{A)\\XMbh. 
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Thus 



\\P^A*h\\F= max \{X,P^,A*b)\< ^/mAA)\\b\\2. 

\\X\\f = 1 



To prove 



note that by the RIP, 

(1 - SriAMP^XWl < WAP^Xfp < (1 + S,.{A))\\P^X\\l, 



which means the eigenvalues of P-q,A*AP^i restricted to span(^') are in the interval [1 — 6r{A), 1 + (5r(.4)]. 
Thus holds. □ 



Proof of Proposition 12.71 First, we will prove 

(A-1) \{A{I - P^)X,AP^Y)\ < 6r{A)\\{I - P^)X\\f\\P^Y\\f,'^Y e R™^",X e span(*')- 

holds obviously if (/ - P^s,)X = or P^Y = 0. Thus we can assume (/ - P^s,)X ^ and P^,Y ^ 0. 

X = 1, y = 1 and = 0. Since 



Define X 



\\{I-P<,)X\\i 



and Y 



Wp^yWf ' 



then we have 



X e span(\l> U ^I/') and Y e span(vl'), we have rank (^X + Y^ <r and rank (jc -Y^ < r. Hence by RIP, 



2(1 - 6r{A)) = (1 - SriA)) 



X + Y 



< 



AX + AY 



< {l+dr{A)) 



X + Y 



2{l + SriA)). 



and 



2(1 ~ Sr{A)) = {1 - Sr{A)) 



X-Y 



< 



AX -AY 



< il + dr{A)) 



X-Y 



2{l + 5riA)). 



Therefore we have 



{AX, AY) = 



AX + AY 



AX -AY 



< Sr{A) 



and 



'{AX, AY) 



AX- AY 



AX + AY 



< SriA). 



Thus, \{AX,AY) \ < Sr{A) and (|^^ holds. 
Finally we have, for any X E span(^''), 

\\P^A*A{I - P^)X\ 



= max \{P^A*A{I - P^)X,Y)\ 

11^ II F — 1 

= max \{A(I - P4,)X,AP^Y)\ 

||y||F=i 

<Sr{A)\\iI-P^)X\\p, 
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i.e., (|2.4p holds, which completes the proof. □ 



Proof of Proposition 12.81 This proof essentially follows that given by Needell and Tropp in [3 

Proof. Let B' := {X e M™^" : rank(X) = s, \\X\\f < 1} be the unit ball of rank-s matrices in 
Define the convex hull of the unit norm matrices with rank at most r as: 

S conv < |J > C M'"^". 



By (|2.5p . wc know that the operator norm 



Define another convex set 



\\A\\s^2^max\\AX\\2 < ^l + Sr{A). 



K:^{X(^ K™^" : \\X\\f + ^||^||* < 1} C 

\ r 



and consider the operator norm 



1^I1a'^2 = max \\AX\\2. 
xeK 



The content of the proposition is the claim that K C S. 

Choose a matrix X £ K with SVD X = UDiag{a-)V^ . Let Iq index the r largest components of cr, 
breaking ties lexicographically. Let Ii index the next largest r components, and so forth. Note that the final 
block I J may have fewer than r components. We may assume that a\i- is nonzero for each j. This partition 
induces a decomposition 

J J 
X = U[Dmgia\iJ + ^ Diag(a|,^. )]^^ - ^oYo + XjY,, 

where Xj = ||?7Diag((T|7^ and Yj = \J^UDia,g{a\i^)V^ . By construction, each matrix Yj belongs to S 
because it's rank is at most r and it has unit Frobenius norm. We will prove that Xj < 1, which implies 
that X can be expressed as a convex combination of matrices from the set S. So X Cz S and K C S. 

Fix j in the range {1,2,..., J}. It follows that a\j. contains at most r elements and cr|/^._j contains 
exactly r elements. Therefore, 

A, = |kUJ|2 < V^lkUJU < V^. i||a|,^.„J|i. 
Summing these relations, we obtain, 

1 1 

Y,x,<—\\a\j^_,\\i<-^mu. 

j=i V V 
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It is obvious that Ao = jlo-|/|-|j|2 < H-'^Hf- Wc now conclude that 



Y^x^<\\x\\f + ^\\x\u<i 

because X E K. This imphes that X G S and K C S, and thus completes the proof. □ 
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